[HBASE-25357] allow specifying binary row key range to pre-split regions #72
Dieken wants to merge 1 commit into apache:master from
Conversation
💔 -1 overall
This message was automatically generated.
Force-pushed from 3244622 to 96bfc39
🎊 +1 overall
This message was automatically generated.
@Dieken please create a Jira for this change if you want to get it merged. Thank you!
For example, when the row key starts with a long integer, we can specify a
range to pre-split regions:
```
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.spark.datasources.HBaseTableCatalog;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.sql.SaveMode;

// Pre-split the new table into 5 regions over the row key range [0L, 2000000L).
// The binary boundary keys are wrapped as ISO_8859_1 strings because Spark
// options can only carry strings.
df.write()
    .format("org.apache.hadoop.hbase.spark")
    .option(HBaseTableCatalog.tableCatalog(), catalog)
    .option(HBaseTableCatalog.newTable(), 5)
    .option(HBaseTableCatalog.regionStart(), new String(Bytes.toBytes(0L), StandardCharsets.ISO_8859_1))
    .option(HBaseTableCatalog.regionEnd(), new String(Bytes.toBytes(2000000L), StandardCharsets.ISO_8859_1))
    .mode(SaveMode.Append)
    .save();
```
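To see what boundaries such a range would produce, here is a minimal sketch assuming the table ends up with evenly spaced split keys in the spirit of HBase's `Bytes.split` (the `SplitPreview` class name is hypothetical, and the connector's exact split logic may differ):

```
import org.apache.hadoop.hbase.util.Bytes;

public class SplitPreview {
  public static void main(String[] args) {
    byte[] start = Bytes.toBytes(0L);
    byte[] end = Bytes.toBytes(2000000L);
    // Bytes.split(a, b, num) returns a, num intermediate keys, and b.
    // 4 split keys (start, 2 intermediate keys, end) yield 5 regions.
    byte[][] boundaries = Bytes.split(start, end, 2);
    for (byte[] b : boundaries) {
      System.out.println(Bytes.toStringBinary(b));
    }
  }
}
```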
Force-pushed from 96bfc39 to 41f2156
```
parameters.get(HBaseTableCatalog.regionEnd)
  .getOrElse(HBaseTableCatalog.defaultRegionEnd))
val startKey = parameters.get(HBaseTableCatalog.regionStart)
  .getOrElse(HBaseTableCatalog.defaultRegionStart).getBytes(StandardCharsets.ISO_8859_1)
```
I'm not sure it is a good idea to use an encoding different from the default used by the Bytes util converter (StandardCharsets.UTF_8); many pieces of HBase code rely on the Bytes converter, so comparisons may become inconsistent.
Also, why are you using a different converter here? Can you elaborate on the issue you are having with the built-in Bytes converter?
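A minimal sketch of the mismatch being raised here, assuming keys produced by the two encodings are compared against each other (`Bytes.toBytes(String)` encodes with UTF-8; the `EncodingMismatch` class name is hypothetical):

```
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hbase.util.Bytes;

public class EncodingMismatch {
  public static void main(String[] args) {
    String s = "é"; // U+00E9, outside ASCII
    byte[] viaBytesUtil = Bytes.toBytes(s);                  // UTF-8 encoding
    byte[] viaIso = s.getBytes(StandardCharsets.ISO_8859_1); // single-byte encoding
    System.out.println(Bytes.toStringBinary(viaBytesUtil));  // \xC3\xA9
    System.out.println(Bytes.toStringBinary(viaIso));        // \xE9
  }
}
```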
The Spark option API passes parameters as strings and does not support passing bytes directly. I need to pass a binary row key, so I have to interpret the binary bytes as an ISO_8859_1-encoded String; the bytes are not valid UTF-8.
It's a trick, and it does break backward compatibility for UTF-8 strings containing characters beyond the ISO_8859_1 charset; such strings must be wrapped as explained in the JIRA issue.
I can't figure out a better way to pass bytes in a Spark option.
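A minimal sketch of why ISO_8859_1 is used for the wrapping: it maps every byte 0x00–0xFF to a distinct char, so the byte[] → String → byte[] round trip is lossless, whereas UTF-8 decoding corrupts arbitrary binary keys (the `CharsetRoundTrip` class name is hypothetical):

```
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class CharsetRoundTrip {
  public static void main(String[] args) {
    byte[] key = Bytes.toBytes(2000000L); // contains bytes that are invalid UTF-8
    // ISO_8859_1 round-trips every byte value unchanged.
    String iso = new String(key, StandardCharsets.ISO_8859_1);
    System.out.println(Arrays.equals(key, iso.getBytes(StandardCharsets.ISO_8859_1))); // true
    // UTF-8 decoding replaces invalid sequences with U+FFFD, so the same
    // round trip through UTF-8 corrupts the key.
    String utf8 = new String(key, StandardCharsets.UTF_8);
    System.out.println(Arrays.equals(key, utf8.getBytes(StandardCharsets.UTF_8))); // false
  }
}
```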